Support SELECT/INSERT/UPDATE/DELETE db operation + span names #1253

Open: 6 commits to be merged into base branch main
Conversation

Contributor

@damemi damemi commented Nov 7, 2024

Ref #743

From the semconv for db spans:

Database spans MUST follow the overall [guidelines for span names](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.37.0/specification/trace/api.md#span).

The span name SHOULD be {db.query.summary} if a summary is available.

If no summary is available, the span name SHOULD be {db.operation.name} {target} provided that a (low-cardinality) db.operation.name is available (see below for the exact definition of the [{target}](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#target-placeholder) placeholder).

If a (low-cardinality) db.operation.name is not available, database span names SHOULD default to the [{target}](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#target-placeholder).

If neither {db.operation.name} nor {target} are available, span name SHOULD be {db.system}.

Semantic conventions for individual database systems MAY specify different span name format.

The {target} SHOULD describe the entity that the operation is performed against and SHOULD adhere to one of the following values, provided they are accessible:

- db.collection.name SHOULD be used for data manipulation operations or operations on a database collection.
- db.namespace SHOULD be used for operations on a specific database namespace.
- server.address:server.port SHOULD be used for other operations not targeting any specific database(s) or collection(s).

If a corresponding {target} value is not available for a specific operation, the instrumentation SHOULD omit the {target}. For example, for an operation describing SQL query on an anonymous table like SELECT * FROM (SELECT * FROM table) t, span name should be SELECT.
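
For reference, the fallback order quoted above could be sketched roughly as follows. This is only an illustration of the naming rules, not the code in this PR; the `dbSpanInfo` type, its fields, and `spanName` are hypothetical names that simply mirror the semconv attributes.

```go
package main

import "fmt"

// dbSpanInfo is a hypothetical container for the values the naming rules
// refer to. An empty string means "not available".
type dbSpanInfo struct {
	QuerySummary  string // db.query.summary
	OperationName string // db.operation.name (low cardinality)
	Target        string // collection, namespace, or address:port
	System        string // db.system
}

// spanName applies the fallback order from the quoted semconv text:
// summary, then "{operation} {target}", then whichever of the two is
// available on its own, then the database system.
func spanName(i dbSpanInfo) string {
	switch {
	case i.QuerySummary != "":
		return i.QuerySummary
	case i.OperationName != "" && i.Target != "":
		return i.OperationName + " " + i.Target
	case i.OperationName != "":
		// The target is omitted when it is not available.
		return i.OperationName
	case i.Target != "":
		return i.Target
	default:
		return i.System
	}
}

func main() {
	fmt.Println(spanName(dbSpanInfo{OperationName: "SELECT", Target: "users"})) // SELECT users
	fmt.Println(spanName(dbSpanInfo{OperationName: "SELECT"}))                  // SELECT
	fmt.Println(spanName(dbSpanInfo{System: "postgresql"}))                     // postgresql
}
```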

We don't have collection (table), namespace, or server.address readily available (right now), but we can at least parse certain operations. We can also support more over time. So this gets us at least a couple of useful span names besides DB (the fallback for unsupported operations).

This means we can also set db.operation.name with that same value when it is available.
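
To make the intended behavior concrete, here is a minimal stdlib-only sketch of "operation name or DB fallback". It is deliberately not the approach in this PR, which uses a SQL parsing library: it only inspects the leading keyword, so it will not handle leading comments, CTEs, or other non-trivial statements. `operationName` and `fallbackSpanName` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// fallbackSpanName is the span name used for unsupported operations.
const fallbackSpanName = "DB"

// operationName returns the upper-cased leading keyword of query when it is
// one of the four supported operations, and "" otherwise.
func operationName(query string) string {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return ""
	}
	switch op := strings.ToUpper(fields[0]); op {
	case "SELECT", "INSERT", "UPDATE", "DELETE":
		return op
	}
	return ""
}

func main() {
	queries := []string{
		"select * from users",
		"INSERT INTO orders (id) VALUES (1)",
		"CREATE TABLE logs (id INT)",
	}
	for _, q := range queries {
		name := operationName(q)
		if name == "" {
			// Unsupported operation: keep the generic span name and
			// leave db.operation.name unset.
			name = fallbackSpanName
		}
		fmt.Printf("%s <- %s\n", name, q)
	}
}
```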

@damemi damemi requested a review from a team as a code owner November 7, 2024 19:37
@damemi damemi force-pushed the sql-semconv branch 3 times, most recently from 8f0e456 to 5340675 Compare November 7, 2024 19:53
@@ -8,6 +8,8 @@ import (
"os"
"strconv"

sql "github.com/xwb1989/sqlparser"
Contributor Author

I looked through a couple of SQL parsing libraries, and this one isn't the most maintained... https://github.com/krasun/gosqlparser seems good, but for some reason I don't think it supports * in queries (??). For what we're doing now I think it's fine, but this could use another look some day.

@damemi damemi force-pushed the sql-semconv branch 2 times, most recently from 89e64c7 to 7116a06 Compare November 8, 2024 19:49
@damemi damemi force-pushed the sql-semconv branch 3 times, most recently from c809878 to 97ccfaf Compare November 20, 2024 16:23

if operation != "" {
span.Attributes().PutStr(string(semconv.DBOperationNameKey), operation)
span.SetName(operation)
Contributor

Should we try to build the {db.query.summary} or {db.operation.name} {target} here given we have already parsed the query?

Contributor Author

I was originally doing db.query.summary, but we don't have any of the values for {target} (db.collection.name, db.namespace, or the server address). In that case the semconv recommends just using the operation:

If a corresponding {target} value is not available for a specific operation, the instrumentation SHOULD omit the {target}. For example, for an operation describing SQL query on an anonymous table like SELECT * FROM (SELECT * FROM table) t, span name should be SELECT.

Contributor

The table might be defined in the query statement though, right? The query parsing library should be able to get this, right?

Contributor Author

Not that I could find, unfortunately :\ The first library I tried (https://github.com/krasun/gosqlparser) does have that, which is how I was building it at first. But that library breaks when it tries to parse a * character (like SELECT * FROM). Which makes no sense to me, but that's my understanding from trying to debug it and read that library's code. I'm open to add a TODO here for this though because maybe I'm just missing something

Contributor

@MrAlias MrAlias Nov 22, 2024

I think this should work, right?

package main

import (
	"fmt"

	"github.com/xwb1989/sqlparser"
)

// Parse takes a SQL query string and returns the parsed query statement type
// and table name, or an error if parsing failed.
func Parse(query string) (string, string, error) {
	stmt, err := sqlparser.Parse(query)
	if err != nil {
		return "", "", fmt.Errorf("failed to parse query: %w", err)
	}

	switch stmt := stmt.(type) {
	case *sqlparser.Select:
		return "SELECT", getTableName(stmt.From), nil
	case *sqlparser.Update:
		return "UPDATE", getTableName(stmt.TableExprs), nil
	case *sqlparser.Insert:
		return "INSERT", stmt.Table.Name.String(), nil
	case *sqlparser.Delete:
		return "DELETE", getTableName(stmt.TableExprs), nil
	default:
		return "UNKNOWN", "", fmt.Errorf("unsupported statement type")
	}
}

// getTableName extracts the table name from a SQL node.
func getTableName(node sqlparser.SQLNode) string {
	switch tableExpr := node.(type) {
	case sqlparser.TableName:
		return tableExpr.Name.String()
	case sqlparser.TableExprs:
		for _, expr := range tableExpr {
			if tableName, ok := expr.(*sqlparser.AliasedTableExpr); ok {
				if name, ok := tableName.Expr.(sqlparser.TableName); ok {
					return name.Name.String()
				}
			}
		}
	}
	return ""
}

func main() {
	queries := []string{
		"SELECT * FROM users WHERE id = 1",
		"SELECT id, name FROM users WHERE id = 1",
		"INSERT INTO orders (id, amount) VALUES (1, 100)",
		"UPDATE products SET price = 19.99 WHERE id = 10",
		"DELETE FROM sessions WHERE expired = true",
		"CREATE TABLE logs (id INT, message TEXT)",
	}
	for _, query := range queries {
		fmt.Println("Query: ", query)
		statement, table, err := Parse(query)
		if err != nil {
			fmt.Println("Error:", err)
			continue
		}
		fmt.Printf("Statement: %s, Table: %s\n", statement, table)
	}
}
$ go run .
Query:  SELECT * FROM users WHERE id = 1
Statement: SELECT, Table: users
Query:  SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Table: users
Query:  INSERT INTO orders (id, amount) VALUES (1, 100)
Statement: INSERT, Table: orders
Query:  UPDATE products SET price = 19.99 WHERE id = 10
Statement: UPDATE, Table: products
Query:  DELETE FROM sessions WHERE expired = true
Statement: DELETE, Table: sessions
Query:  CREATE TABLE logs (id INT, message TEXT)
Error: unsupported statement type

FWIW, looking at vitess, it seems like it should be able to support more statement types in the table lookup (i.e. table create).

Contributor

Using "vitess.io/vitess/go/vt/sqlparser":

package main

import (
	"fmt"

	"vitess.io/vitess/go/vt/sqlparser"
)

// Parse takes a SQL query string and returns the parsed query statement type
// and table name(s), or an error if parsing failed.
func Parse(query string) (string, []string, error) {
	p, err := sqlparser.New(sqlparser.Options{})
	if err != nil {
		return "", nil, fmt.Errorf("failed to create parser: %w", err)
	}

	stmt, err := p.Parse(query)
	if err != nil {
		return "", nil, fmt.Errorf("failed to parse query: %w", err)
	}

	var statementType string
	var tables []string

	switch stmt := stmt.(type) {
	case *sqlparser.Select:
		statementType = "SELECT"
		tables = extractTables(stmt.From)
	case *sqlparser.Update:
		statementType = "UPDATE"
		tables = extractTables(stmt.TableExprs)
	case *sqlparser.Insert:
		statementType = "INSERT"
		tables = []string{stmt.Table.TableNameString()}
	case *sqlparser.Delete:
		statementType = "DELETE"
		tables = extractTables(stmt.TableExprs)
	case *sqlparser.CreateTable:
		statementType = "CREATE TABLE"
		tables = []string{stmt.Table.Name.String()}
	case *sqlparser.AlterTable:
		statementType = "ALTER TABLE"
		tables = []string{stmt.Table.Name.String()}
	case *sqlparser.DropTable:
		statementType = "DROP TABLE"
		for _, table := range stmt.FromTables {
			tables = append(tables, table.Name.String())
		}
	case *sqlparser.CreateDatabase:
		statementType = "CREATE DATABASE"
		tables = []string{stmt.DBName.String()}
	case *sqlparser.DropDatabase:
		statementType = "DROP DATABASE"
		tables = []string{stmt.DBName.String()}
	case *sqlparser.TruncateTable:
		statementType = "TRUNCATE TABLE"
		tables = []string{stmt.Table.Name.String()}
	default:
		return "UNKNOWN", nil, fmt.Errorf("unsupported statement type")
	}

	return statementType, tables, nil
}

// extractTables extracts table names from a list of SQL nodes.
func extractTables(exprs sqlparser.TableExprs) []string {
	var tables []string
	for _, expr := range exprs {
		switch tableExpr := expr.(type) {
		case *sqlparser.AliasedTableExpr:
			if name, ok := tableExpr.Expr.(sqlparser.TableName); ok {
				tables = append(tables, name.Name.String())
			}
		}
	}
	return tables
}

func main() {
	queries := []string{
		"SELECT * FROM users WHERE id = 1",
		"SELECT id, name FROM users WHERE id = 1",
		"INSERT INTO users (id, name) VALUES (1, 'Alice')",
		"UPDATE users SET name = 'Bob' WHERE id = 1",
		"DELETE FROM users WHERE id = 1",
		"CREATE TABLE users (id INT, name VARCHAR(100))",
		"ALTER TABLE users ADD COLUMN age INT",
		"DROP TABLE users",
		"CREATE DATABASE test_db",
		"DROP DATABASE test_db",
		"TRUNCATE TABLE users",
	}

	for _, query := range queries {
		fmt.Println("Query: ", query)
		statement, tables, err := Parse(query)
		if err != nil {
			fmt.Printf("Error parsing query: %s\nQuery: %s\n\n", err, query)
			continue
		}
		fmt.Printf("Statement: %s, Tables/DBs: %v\n", statement, tables)
	}
}
$ go run .
Query:  SELECT * FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query:  SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query:  INSERT INTO users (id, name) VALUES (1, 'Alice')
Statement: INSERT, Tables/DBs: [users]
Query:  UPDATE users SET name = 'Bob' WHERE id = 1
Statement: UPDATE, Tables/DBs: [users]
Query:  DELETE FROM users WHERE id = 1
Statement: DELETE, Tables/DBs: [users]
Query:  CREATE TABLE users (id INT, name VARCHAR(100))
Statement: CREATE TABLE, Tables/DBs: [users]
Query:  ALTER TABLE users ADD COLUMN age INT
Statement: ALTER TABLE, Tables/DBs: [users]
Query:  DROP TABLE users
Statement: DROP TABLE, Tables/DBs: [users]
Query:  CREATE DATABASE test_db
Statement: CREATE DATABASE, Tables/DBs: [test_db]
Query:  DROP DATABASE test_db
Statement: DROP DATABASE, Tables/DBs: [test_db]
Query:  TRUNCATE TABLE users
Statement: TRUNCATE TABLE, Tables/DBs: [users]

@@ -27,6 +29,9 @@ const (

// IncludeDBStatementEnvVar is the environment variable to opt-in for sql query inclusion in the trace.
IncludeDBStatementEnvVar = "OTEL_GO_AUTO_INCLUDE_DB_STATEMENT"

// IncludeDBOperationEnvVar is the environment variable to opt-in for sql query operation in the trace.
IncludeDBOperationEnvVar = "OTEL_GO_AUTO_INCLUDE_DB_OPERATION"
Contributor

Can we call this OTEL_GO_AUTO_PARSE_DB_STATEMENT instead? Given we are going to use the parsed semantics for more than just the operation, it might better convey the intention.
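
As a rough sketch of how such an opt-in could be read (the diff already imports `os` and `strconv`), something like the following would treat the feature as off unless the variable parses as true. The variable name is the one proposed in this comment and may not be what gets merged; `parseEnabled` is a hypothetical helper that takes the lookup function as a parameter so it can be tested without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseDBStatementEnvVar is the name proposed in review; it is an
// assumption here, not a confirmed setting.
const parseDBStatementEnvVar = "OTEL_GO_AUTO_PARSE_DB_STATEMENT"

// parseEnabled reports whether SQL statement parsing is opted in.
// Unset or unparsable values leave the feature disabled.
func parseEnabled(lookup func(string) (string, bool)) bool {
	val, ok := lookup(parseDBStatementEnvVar)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(val)
	return err == nil && enabled
}

func main() {
	fmt.Println("parse DB statements:", parseEnabled(os.LookupEnv))
}
```

Defaulting to disabled on a malformed value keeps the behavior opt-in, which matches how the existing statement-inclusion variable is described.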

@@ -22,6 +22,7 @@ require (
github.com/hashicorp/go-version v1.7.0
github.com/pkg/errors v0.9.1
github.com/stretchr/testify v1.9.0
github.com/xwb1989/sqlparser v0.0.0-20180606152119-120387863bf2
Contributor

Could we instead use the code this was based on: https://github.com/vitessio/vitess/tree/main/go/vt/sqlparser

It seems like this was a fork of that repository [1], but it hasn't been maintained or synced since that fork.

Would it be too large of a dependency to just rely on vitess directly?

Footnotes

  1. https://github.com/xwb1989/sqlparser?tab=readme-ov-file#notice

Contributor Author

I did think about using vitess directly, but it seemed like a lot more than what we needed. Since this is just an implementation detail, we could refactor it later if we want, but I think this is the best option for right now.

Contributor

Looking through the changes that have been applied to the upstream code over the past 6 years, I'm concerned: there is a considerable number of them. Notably:

Should we make our own fork of this package?

Contributor

Can we maybe get a better understanding of dependency size by looking at what the binary size is based on the possible dependencies?

Contributor Author

I'll admit it looks like you've dug into it deeper than I did, hah. These are some great points so maybe that is the better approach. I'll try switching it to that


3 participants